## Chapter 15

Some Modern Architectures

#### RISC versus CISC

- RISC designed for speed
- CISC designed for power
- CISC attempts to minimize the semantic gap.
- CISC has complex instructions such as a block copy which executes a copy loop in microcode.
- RISC performs a block copy by executing a loop at the machine level.

Compilers often do not take advantage of the complex instructions available on a CISC. A complex instruction often does not match the semantics of the high-level construct, so the compiler does not use it.

FIGURE 15.1 a) C++ code

```
for (int x = 0; x < 10; x++)
  cout << "hello\n";
```

b) Efficient assembler code (optimal instruction set)

```
      ldc  10
      sect             ; load 10 into ct register
@L0:
                       ; body of loop
      dect             ; decrement ct, skip next inst if ct == 0
      ja   @L0
```

c) Inefficient assembler code (optimal instruction set)

```
                       ; allocate and initialize x
      push             ; assume relative address of x is -1
@L0:  ldr  -1          ; exit test
      push
      ldc  10
      scmp             ; compare x and 10
      jzop @L1         ; jump if x >= 10
                       ; loop body
      ldc  1           ; increment x
      addr -1
      str  -1
      ja   @L0
@L1:  dloc 1           ; deallocate x
```

# The SPARC is a RISC architecture.

#### Integer Registers

| Register | Alternate name | Group            |
|----------|----------------|------------------|
| %r0      | %g0            | global registers |
| %r1      | %g1            | global registers |
| %r2      | %g2            | global registers |
| %r3      | %g3            | global registers |
| %r4      | %g4            | global registers |
| %r5      | %g5            | global registers |
| %r6      | %g6            | global registers |
| %r7      | %g7            | global registers |
| %r8      | %o0            | out registers    |
| %r9      | %o1            | out registers    |
| %r10     | %o2            | out registers    |
| %r11     | %o3            | out registers    |
| %r12     | %o4            | out registers    |
| %r13     | %o5            | out registers    |
| %r14     | %o6 (%sp)      | out registers    |
| %r15     | %o7            | out registers    |
| %r16     | %l0            | local registers  |
| %r17     | %l1            | local registers  |
| %r18     | %l2            | local registers  |
| %r19     | %l3            | local registers  |
| %r20     | %l4            | local registers  |
| %r21     | %l5            | local registers  |
| %r22     | %l6            | local registers  |
| %r23     | %l7            | local registers  |
| %r24     | %i0            | in registers     |
| %r25     | %i1            | in registers     |
| %r26     | %i2            | in registers     |
| %r27     | %i3            | in registers     |
| %r28     | %i4            | in registers     |
| %r29     | %i5            | in registers     |
| %r30     | %i6 (%fp)      | in registers     |
| %r31     | %i7            | in registers     |

The SPARC uses multiple sets of registers to speed up function calls and returns. Registers don't have to be saved and restored on function call and return.



Overlapping register sets make for efficient parameter passing.

The output registers of the calling function become the input registers of the called function.
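The overlap can be sketched in C. This is a simplified model, not the hardware's actual layout: the window count `NWINDOWS`, the ring of physical registers, and the function names are all assumptions made for illustration. Each window contributes 16 new registers (8 outs and 8 locals), and its 8 ins are the outs of the window that called it.

```c
#include <assert.h>

/* Assumed model: NWINDOWS windows arranged in a ring of physical registers. */
enum { NWINDOWS = 8, RING_SIZE = NWINDOWS * 16 };

/* Map a windowed register number (%r8..%r31) in window cwp
   to an index in the physical ring. */
int physical_index(int cwp, int r)
{
    assert(cwp >= 0 && cwp < NWINDOWS);
    assert(r >= 8 && r <= 31);   /* %r0..%r7 are globals, not windowed */
    return (cwp * 16 + (r - 8)) % RING_SIZE;
}

/* A save switches to the next window; in this model it
   decrements the window number (mod NWINDOWS). */
int window_after_save(int cwp)
{
    return (cwp + NWINDOWS - 1) % NWINDOWS;
}
```

In this model, `physical_index(window_after_save(w), 24)` equals `physical_index(w, 8)`: the called function's %i0 is the same physical register as the caller's %o0, so a parameter placed in %o0 needs no copying.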



Window overflow occurs when no register sets are available for a function call. A register set has to be spilled (written to main memory) to make it available for the called function.



#### WINDOW OVERFLOW



## The save and restore instructions trigger a register switch.

#### FIGURE 15.6



During the execution of a function, %sp points to its stack frame and %fp points to the calling function's stack frame.



If an item's address is passed, the item must be in memory (so that it has an address). This is why a stack frame has an area for parameters, although they are usually passed in registers.


## The spill area for a function is in its stack frame.

A window underflow occurs on return to a function whose register set was spilled (which now has to be restored from the spill area).

FIGURE 15.9



The SPARC has a *load/store* architecture. That is, only the load and store instructions access main memory.

### Code for load/store architecture

1. Load the first value into a register.
2. Load the second value into a register.
3. Add the two registers and place the result in a register.
4. Store the result back into memory.

In contrast, in an architecture that has an add instruction for which one of the operands is in memory (like the add instruction on H1), we have to

1. Load the first value into a register.
2. Add the second value in memory to the register containing the first value.
3. Store the result back into memory.

## Advantages of load/store architecture

- Makes for a smaller instruction set (we do not need all the arithmetic instructions that access main memory).
- Minimizes contention for main memory.
- More uniform instruction set with respect to execution time (this is good for pipelining).

#### ld instruction



#### Three variations of ld

```
ld [%sp], %o0           ! %o0 = mem[%sp];
ld [%sp + %l3], %o0     ! %o0 = mem[%sp + %l3];
ld [%sp + 68], %o0      ! %o0 = mem[%sp + 68];
```

**FIGURE 15.10** a) Two source registers

| 31-30 | 29-25 | 24-19 | 18-14 | 13 | 12-5 | 4-0 |
|-------|-------|-------|-------|----|------|-----|
| op    | rd    | op3   | rs1   | 0  |      | rs2 |

b) One source register plus a displacement

| 31-30 | 29-25 | 24-19 | 18-14 | 13 | 12-0   |
|-------|-------|-------|-------|----|--------|
| op    | rd    | op3   | rs1   | 1  | simm13 |

op: opcode, 11 for both ld and st

rd: destination register

op3: opcode, 000000 for ld, 000100 for st

rs1: source register 1

rs2: source register 2

simm13: 13-bit signed immediate value (holds displacement for load and store instructions)

# A load must be from a word boundary (from an address divisible by 4)

```
x = *(int *)p;
```

This statement first casts the pointer to an int pointer and then dereferences it to access the word to which it points. However, the statement works only if the integer is on a word boundary. If it is not, a Sun workstation responds with the cryptic error message "Bus Error." This error occurred when lin (the H1 linker) was ported from a Pentium machine to a Sun SPARC workstation. The Pentium does not require any boundary alignment for its load instructions, so it does not object to the previous statement (which appeared in the C++ code for lin). To fix this problem, the C++ statement was replaced with

```
memcpy(&x, p, sizeof(int));
```
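The same fix can be wrapped in a small helper. This is a minimal sketch; the function name `read_int_unaligned` is hypothetical and does not come from the lin source. Because `memcpy` copies byte by byte as far as the program is concerned, the source address does not have to be on a word boundary.

```c
#include <assert.h>
#include <string.h>

/* Read an int from an address that may not be word-aligned. */
int read_int_unaligned(const void *p)
{
    int x;
    assert(p != NULL);
    memcpy(&x, p, sizeof(int));   /* no alignment requirement on p */
    return x;
}
```

A call such as `read_int_unaligned(buf + 1)` is then safe even when `buf + 1` is an odd address, which the cast-and-dereference form is not on the SPARC.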

#### st instruction

```
st %o0, [%sp]           ! mem[%sp] = %o0;
st %o0, [%sp + %l3]     ! mem[%sp + %l3] = %o0;
st %o0, [%sp + 68]      ! mem[%sp + 68] = %o0;
```

#### or instruction

```
orcc %g0, %i3, %i5      ! sets condition code
or   %g0, %i3, %i5      ! does not set condition code
```

## ld and or have essentially the same format

**FIGURE 15.11**

| Instruction       | op (31-30) | rd (29-25) | op3 (24-19) | rs1 (18-14) | i (13) | simm13 (12-0) |
|-------------------|------------|------------|-------------|-------------|--------|---------------|
| ld [%i0 + 4], %o0 | 11         | 01000      | 000000      | 11000       | 1      | 0000000000100 |
| or %i0, 4, %o0    | 10         | 01000      | 000010      | 11000       | 1      | 0000000000100 |

rd: %o0 (%r8)

rs1: %i0 (%r24)
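As a check on the field layout, the format in FIGURE 15.10 b) can be assembled with shifts and ors. This is only a sketch: the function name `encode_format3_imm` is an assumption, and the register numbers (%o0 = 8, %i0 = 24) are taken from the integer register table.

```c
#include <assert.h>
#include <stdint.h>

/* Assemble an instruction in the "one source register plus displacement"
   format: op | rd | op3 | rs1 | 1 | simm13. */
uint32_t encode_format3_imm(uint32_t op, uint32_t rd, uint32_t op3,
                            uint32_t rs1, int32_t simm13)
{
    assert(op <= 3 && rd <= 31 && op3 <= 63 && rs1 <= 31);
    assert(simm13 >= -4096 && simm13 <= 4095);   /* must fit in 13 signed bits */
    return (op << 30) | (rd << 25) | (op3 << 19) | (rs1 << 14)
         | (1u << 13) | ((uint32_t)simm13 & 0x1FFFu);
}
```

With these assumptions, `encode_format3_imm(3, 8, 0, 24, 4)` gives 0xD0062004 for ld [%i0 + 4], %o0, and `encode_format3_imm(2, 8, 2, 24, 4)` gives 0x90162004 for or %i0, 4, %o0. The two words differ only in the op and op3 fields.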

To load from a symbolic label, you must first load the label's address into a register and then use a ld instruction. To load the address of a label into a register, use a sethi-or sequence.

    sethi %hi(x), %i0        ! load high 22 bits into %i0
    or    %i0, %lo(x), %i0   ! or 10 low bits into %i0

    .section ".data"         ! data section
    .align 4
    x:  .word 7

#### Getting the address of x.

The code for

```
p = &x;
```

is

```
sethi %hi(x), %i0          ! get address of x
or    %i0, %lo(x), %i0
sethi %hi(p), %i1          ! store address in p
st    %i0, [%i1 + %lo(p)]
```

#### Loading a 32-bit constant

```
sethi %hi(0x12345678), %o0          ! load high 22 bits
or    %o0, %lo(0x12345678), %o0     ! or in low 10 bits
```
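The split performed by %hi and %lo can be modeled in C. This is a sketch of the arithmetic only; the function names `hi22`, `lo10`, and `sethi_then_or` are assumptions, not assembler features.

```c
#include <assert.h>
#include <stdint.h>

/* %hi(value): the high 22 bits. */
uint32_t hi22(uint32_t value) { return value >> 10; }

/* %lo(value): the low 10 bits. */
uint32_t lo10(uint32_t value) { return value & 0x3FFu; }

/* What the sethi-or pair leaves in the destination register. */
uint32_t sethi_then_or(uint32_t value)
{
    uint32_t reg = hi22(value) << 10;   /* sethi: high 22 bits, low 10 bits zero */
    reg |= lo10(value);                 /* or: fill in the low 10 bits */
    assert(reg == value);
    return reg;
}
```

For 0x12345678, `hi22` yields 0x48D15 and `lo10` yields 0x278. Two instructions are needed because a 32-bit constant cannot fit in one 32-bit instruction alongside the opcode and register fields.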

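FIGURE 15.13 a) sethi

| 31-30 | 29-25 | 24-22 | 21-0                                                  |
|-------|-------|-------|-------------------------------------------------------|
| op    | rd    | 100   | 22-bit immediate field (high 22 bits of address of x) |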



The 64-bit SPARC machines have an additional set of condition codes, xcc, that are set according to the results of 64-bit operations. icc is set according to the results of 32-bit operations. The 64-bit SPARC machines have branch instructions that test either icc or xcc. These are the new branch instructions. The old branch instructions test only icc.

The call instruction has a 30-bit displacement field that allows a transfer to any word address.

#### **Branch instructions**

```
ba xxx          ! old, use icc
ba %icc, xxx    ! new, use icc
ba %xcc, xxx    ! new, use xcc
```

FIGURE 15.14 Branch instructions

| Opcode | Cond | Mnemonic | Name (test)                                        |
|--------|------|----------|----------------------------------------------------|
| 00     | 1000 | ba       | Branch always                                      |
| 00     | 0000 | bn       | Branch never                                       |
| 00     | 0001 | be       | Branch equal (Z == 1)                              |
| 00     | 0010 | ble      | Branch less or equal ((N ^ V) == 1 \|\| Z == 1)    |
| 00     | 0100 | bleu     | Branch less or equal unsigned (C == 1 \|\| Z == 1) |
| 00     | 0011 | bl       | Branch less ((N ^ V) == 1)                         |
| 00     | 0101 | bcs      | Branch carry set (C == 1)                          |
|        |      | blu      | Branch less unsigned (C == 1)                      |
| 00     | 0110 | bneg     | Branch negative (N == 1)                           |
| 00     | 1001 | bne      | Branch not equal (Z == 0)                          |
| 00     | 1010 | bg       | Branch greater (Z == 0 && (N ^ V) == 0)            |
| 00     | 1100 | bgu      | Branch greater unsigned (Z == 0 && C == 0)         |
| 00     | 1011 | bge      | Branch greater or equal ((N ^ V) == 0)             |
| 00     | 1101 | bcc      | Branch carry clear (C == 0)                        |
|        |      | bgeu     | Branch greater or equal unsigned (C == 0)          |
| 00     | 1110 | bpos     | Branch positive (N == 0)                           |
| 00     | 0111 | bvs      | Branch signed overflow (V == 1)                    |
| 00     | 1111 | bvc      | Branch no signed overflow (V == 0)                 |

FIGURE 15.15 a) Old branch instruction

| 31-30 | 29 | 28-25 | 24-22 | 21-0   |
|-------|----|-------|-------|--------|
| 00    | a  | cond  | 010   | disp22 |

b) New branch instruction

| 31-30 | 29 | 28-25 | 24-22 | 21-20 | 19 | 18-0   |
|-------|----|-------|-------|-------|----|--------|
| 00    | a  | cond  | 001   | cc    | p  | disp19 |

c) Call instruction

| 31-30 | 29-0   |
|-------|--------|
| 01    | disp30 |

a: annul bit

cond: condition

cc: condition code set (00: icc, 10: xcc)

p: prediction bit

disp19: 19-bit signed word displacement

disp22: 22-bit signed word displacement

disp30: 30-bit signed word displacement
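Because the displacements count words, the hardware sign-extends the field and multiplies it by 4 before adding it to the address of the control instruction. A minimal C sketch of that calculation follows; the function names `sign_extend` and `transfer_target` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `bits` bits of `field` (1 <= bits <= 31). */
int32_t sign_extend(uint32_t field, int bits)
{
    uint32_t mask, sign;
    assert(bits >= 1 && bits <= 31);
    mask = (1u << bits) - 1u;
    sign = 1u << (bits - 1);
    return (int32_t)((field & mask) ^ sign) - (int32_t)sign;
}

/* Target = address of the control instruction + 4 * word displacement
   (32-bit addresses, wrapping on overflow). */
uint32_t transfer_target(uint32_t pc, uint32_t field, int bits)
{
    return pc + 4u * (uint32_t)sign_extend(field, bits);
}
```

For example, a disp22 field of all ones is -1, so `transfer_target(0x1000, 0x3FFFFF, 22)` is 0xFFC, one word before the branch. With 30 bits, the scaled displacement spans the whole 32-bit address space, which is why call can reach any word address.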

#### Comparing using subcc

```
subcc %i0, %i1, %g0    ! compare %i0 with %i1, discard the difference
bl    xxx              ! branch if %i0 < %i1
```
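How subcc sets the condition codes, and how a few of the branch tests in FIGURE 15.14 read them, can be sketched in C. The `Flags` type and the function names are assumptions; only the 32-bit (icc) case is modeled.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int n, z, v, c; } Flags;

/* Condition codes produced by subcc a, b, %g0. */
Flags subcc(uint32_t a, uint32_t b)
{
    Flags f;
    uint32_t result = a - b;
    f.n = (int)(result >> 31);                     /* sign bit of the result */
    f.z = (result == 0);
    f.v = (int)(((a ^ b) & (a ^ result)) >> 31);   /* signed overflow */
    f.c = (a < b);                                 /* unsigned borrow */
    assert(f.z == (a == b));
    return f;
}

/* Branch tests, written as in the branch table. */
int takes_bl(Flags f)   { return (f.n ^ f.v) == 1; }
int takes_ble(Flags f)  { return (f.n ^ f.v) == 1 || f.z == 1; }
int takes_bg(Flags f)   { return f.z == 0 && (f.n ^ f.v) == 0; }
int takes_bgu(Flags f)  { return f.z == 0 && f.c == 0; }
int takes_bleu(Flags f) { return f.c == 1 || f.z == 1; }
```

Comparing 0xFFFFFFFF with 1 shows why signed and unsigned branches both exist: bl is taken (as signed values, -1 < 1) and so is bgu (as unsigned values, 0xFFFFFFFF > 1).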

#### Instruction pipelining

Same principle as a car assembly line. A car is built in stages. An instruction is executed in stages.

A four-stage pipeline can increase the execution rate by up to a factor of 4.

#### Stages of our pipeline

- Fetch (fetch instruction)
- Decode (decode opcode, access operands, compute effective address)
- Execute (Perform ALU operation)
- Write back (Update registers)

**FIGURE 15.16** a) Snapshots of a four-stage pipeline



#### b) Instruction vs. time representation

| Instruction | Time 0 | Time 1 | Time 2 | Time 3 |
|-------------|--------|--------|--------|--------|
| I1          | S1     | S2     | S3     | S4     |
| I2          |        | S1     | S2     | S3     |
| I3          |        |        | S1     | S2     |
| I4          |        |        |        | S1     |
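The factor-of-4 figure is a limit that is approached only for long instruction sequences. A rough C sketch of the cycle counts follows, assuming one clock cycle per stage and no stalls; the function names are hypothetical.

```c
#include <assert.h>

/* Cycles to run n instructions through a k-stage pipeline with no stalls:
   k cycles for the first instruction, then one more per instruction. */
long pipelined_cycles(long n, long k)
{
    assert(n >= 1 && k >= 1);
    return k + (n - 1);
}

/* Cycles without pipelining: each instruction uses all k stages by itself. */
long unpipelined_cycles(long n, long k)
{
    assert(n >= 1 && k >= 1);
    return n * k;
}

/* Ratio of the two; approaches k as n grows. */
double speedup(long n, long k)
{
    return (double)unpipelined_cycles(n, k) / (double)pipelined_cycles(n, k);
}
```

For the four instructions in the table, `pipelined_cycles(4, 4)` is 7 against 16 unpipelined cycles, a speedup of about 2.3. For 1000 instructions the speedup is roughly 3.99.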

#### Data dependency

One instruction needs the data provided by another instruction.

How is this sequence handled by the pipeline? Note the data dependency of the **or** on the **ld**.

```
ld [%l0], %o0
ld [%l1], %o1
or %o0, %o1, %o2
```

# Wrong--what about the data dependency between ld and or?

**FIGURE 15.17** a)



## What really happens—pipeline stalls




#### Bubble appears in pipeline because of stall



### Control dependency

The availability of one instruction depends on another.

### Some useful terminology

1. The *control instruction* (an instruction that conditionally or unconditionally transfers control to a new location). Control instructions include the branch, call, and jmpl instructions.
2. The *fall-through instruction* (the instruction that physically follows the control instruction in memory).
3. The *target instruction* (the instruction at the location to which the control instruction transfers control if the transfer occurs).

#### Delayed branching

Allow the fall-through instruction to execute whether or not the transfer of control occurs.

FIGURE 15.18 a) Without delayed branching

(Timing diagram. Rows, in order: control instruction, fall-through instruction, target instruction, next instructions.)





#### To pass 5 in %o0 to sub

```
call sub_ii
or %g0, 5, %o0 ! load 5 into %o0
```

or completes before transfer of control

Only the transfer of control is delayed on delayed branching. In all other respects, the delay-slot instruction executes after the control instruction. Thus, you **cannot** reorder the first sequence to the second sequence below.

```
ld    [%i0], %o2
subcc %o0, %o1, %g0       ! compare %o0 with %o1
bl    xxx                 ! branch based on result of subcc
```

to

```
ld    [%i0], %o2
bl    xxx                 ! branch based on result of subcc
subcc %o0, %o1, %g0       ! compare %o0 with %o1
```
But the reordering below is ok because the bl instruction does not depend on the ld instruction.

```
subcc %o0, %o1, %g0       ! compare %o0 with %o1
bl    xxx                 ! branch based on result of subcc
ld    [%i0], %o2
```

# If you can't move any instruction into the delay slot, use nop

```
ld    [%l0], %o0
ld    [%l1], %o1
subcc %o0, %o1, %g0       ! depends on ld instruction
bl    xxx                 ! branch based on result of compare
nop                       ! fill in delay slot with nop
```

## Calling sequence corresponding to sub(1,2);

```
sub(1,2);
```

is

```
or   %g0, 1, %o0     ! load 1 into %o0
call sub_ii
or   %g0, 2, %o1     ! load 2 into %o1
```

When the call instruction is executed, it places its own address in %o7, which becomes %i7 in the called function. Thus, the return address is given by

$$\%i7 + 8$$

The "+ 8" is needed to skip over the call and delay slot instructions. The instruction

jmpl %i7 + 8, %g0

in the called function jumps to the return address.

### Linkage instructions



You should not use %i7 as shown below because the register switch has already occurred when the jmpl instruction is executed.

```
restore
jmpl %i7 + 8, %g0 ! accessing caller's %i7
nop
```

#### Correct return sequences

```
restore
jmpl %o7 + 8, %g0 ! accessing caller's %o7
nop
```

or

```
jmpl %i7 + 8, %g0
restore           ! switch after jmpl, so use %i7
```

#### Addressing modes

FIGURE 15.20

| Addressing mode                     | Example             |
|-------------------------------------|---------------------|
| register direct/immediate           | add %o0, 5, %o2     |
| register direct                     | add %o0, %o1, %o2   |
| memory direct                       | ld [40], %o0        |
| register indirect                   | ld [%i0], %o0       |
| register indirect with displacement | ld [%i0 + 8], %o0   |
| register indirect with indexing     | ld [%i0 + %i1], %o0 |

## Using an index register

FIGURE 15.21

```
                                ! assume address of table is in %i0
        or    %g0, %g0, %i1     ! zero out index reg
        or    %g0, 10, %i2      ! set count to 10
loop:   ld    [%i0 + %i1], %o0  ! access element from table
        subcc %i2, 1, %i2       ! decrement count
        bne   loop              ! branch if count not zero
        add   %i1, 4, %i1       ! add 4 to index reg
```

#### Assembler directives

- .section (lines 1, 28, 34) specifies the type of section that follows.
- .global (line 2) flags the listed identifiers as global.
- .asciz (line 29) creates a null-terminated ASCII string.
- .align (line 30) forces a boundary alignment.
- .word (line 31) defines a word.
- .skip (line 36) reserves the indicated number of bytes.

The following program illustrates the use of these directives.

#### **FIGURE 15.22**

```
 1          .section ".text"              ! text section
 2          .global main
 3 main:
 4          save  %sp, -96, %sp           ! create called function's frame
 5                                        ! and switch regs
 6
 7          sethi %hi(x), %l7             ! get x
 8          ld    [%l7 + %lo(x)], %o0
 9
10          sethi %hi(y), %l7             ! get y
11          ld    [%l7 + %lo(y)], %o1
12
13          add   %o0, %o1, %o0           ! add x and y
14
15          sethi %hi(sum), %l7           ! store result in sum
16          st    %o0, [%l7 + %lo(sum)]
17
18          sethi %hi(cs), %o0            ! get high part of address of cs
19          or    %o0, %lo(cs), %o0       ! get low part of address of cs
20
21          sethi %hi(sum), %l7           ! get high part of address of sum
22          call  printf
23          ld    [%l7 + %lo(sum)], %o1   ! load sum into %o1
24
25          jmpl  %i7 + 8, %g0            ! return to caller
26          restore                       ! switch to caller's regs
27
28          .section ".data"              ! initialized data section
29 cs:      .asciz "sum = %d\n"           ! null-terminated string
30          .align 4                      ! force full-word boundary
31 x:       .word 1                       ! integer word
32 y:       .word 15                      ! integer word
33
34          .section ".bss"               ! uninitialized data section
35          .align 4
36 sum:     .skip 4
```

# Termination sequence (an alternative to jmpl-restore)

```
or %g0, 1, %g1     ! load 1 (terminate program) into %g1
                   ! trap to operating system
```

A synthetic instruction is shorthand for another instruction. For example,

cmp %i0, %i1

is the synthetic instruction for

subcc %i0, %i1, %g0

### FIGURE 15.23

|   | Synthetic Instruction | Real Instruction            | Function   |
|---|-----------------------|-----------------------------|------------|
|   | btst reg2/immed, reg1 | andcc reg1, reg2/immed, %g0 | bit test   |
|   | bset reg2/immed, reg1 | or reg1, reg2/immed, reg1   | bit set    |
|   | bclr reg2/immed, reg1 | andn reg1, reg2/immed, reg1 | bit clear  |
|   | btog reg2/immed, reg1 | xor reg1, reg2/immed, reg1  | bit toggle |
| • | clr reg                  | or %g0, %g0, reg             | clear         |
|   | clr [address]            | st %g0, [address]            | clear         |
|   | clrh [address]           | sth %g0, [address]           | clear half    |
|   | clrb [address]           | stb %g0, [address]           | clear byte    |
|   | cmp reg1, reg2/immed     | subcc reg1, reg2/immed, %g0  | compare       |
|   | dec reg                  | sub reg, 1, reg              | decrement by 1 |
|   | dec immed, reg           | sub reg, immed, reg          | decrement by immed |
|   | deccc reg                | subcc reg, 1, reg            | decrement by 1, set condition codes |
|   | deccc immed, reg         | subcc reg, immed, reg        | decrement by immed, set condition codes |
|   | inc reg                  | add reg, 1, reg              | increment by 1 |
|   | inc immed, reg           | add reg, immed, reg          | increment by immed |
|   | inccc reg                | addcc reg, 1, reg            | increment by 1, set condition codes |
|   | inccc immed, reg         | addcc reg, immed, reg        | increment by immed, set condition codes |
| ÷ | mov reg1/immed, reg2     | or %g0, reg1/immed, reg2     | move          |
|   | neg reg                  | sub %g0, reg, reg            | negate        |
|   | nop                      | sethi 0, %g0                 | no operation  |
|   | ret                      | jmpl %i7 + 8, %g0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
                                                                                | return        |
|   | retl                     | jmpl %07 + 8, %g0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
                                                                                | ret from leaf |
|   | If 4096 <= value < 4096  |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                |               |
|   | set value, reg           | or %g0, value, reg                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
                                                                                | set to value  |
|   | If value $\& 0x3ff == 0$ |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                |               |
|   | set value, reg           | sethi %hi(value), reg                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          
                                                                                | set to value  |
|   | If value neither         |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                |               |
|   | set value, reg           | sethi %hi(value), reg                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          
                                                                                | set to value  |
|   |                          | or reg, %lo(value), reg                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
                                                                                | ٠.            |
|   |                          | the production of the second section is a second section of the section |               |
|   |                          | t in the second of the second  |               |
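The three cases for `set` in the table can be sketched in C++. This is only an illustration of the selection rule, not how any particular assembler is written; `expand_set`, `hi22`, `lo10`, and `to_hex` are hypothetical names introduced here. It assumes `%hi` denotes the upper 22 bits of a 32-bit value and `%lo` the lower 10 bits.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// %hi(value): upper 22 bits; %lo(value): lower 10 bits.
std::uint32_t hi22(std::uint32_t value) { return value >> 10; }
std::uint32_t lo10(std::uint32_t value) { return value & 0x3ffu; }

// Formats a value as a hexadecimal literal such as 0x12345678.
std::string to_hex(std::uint32_t value) {
    std::ostringstream out;
    out << "0x" << std::hex << value;
    return out.str();
}

// Hypothetical helper: the real instructions that could stand in for
// the synthetic instruction "set value, reg", following the table.
std::vector<std::string> expand_set(std::uint32_t value, const std::string& reg) {
    const std::int32_t as_signed = static_cast<std::int32_t>(value);
    if (as_signed >= -4096 && as_signed < 4096) {
        // Fits in the 13-bit signed immediate field of "or".
        return {"or %g0, " + std::to_string(as_signed) + ", " + reg};
    }
    if (lo10(value) == 0) {
        // Low 10 bits are zero, so sethi alone builds the value.
        return {"sethi %hi(" + to_hex(value) + "), " + reg};
    }
    // General case: the two halves must recombine to the original value.
    assert(((hi22(value) << 10) | lo10(value)) == value);
    return {"sethi %hi(" + to_hex(value) + "), " + reg,
            "or " + reg + ", %lo(" + to_hex(value) + "), " + reg};
}
```

Under these assumptions, `expand_set(0x12345678, "%o0")` yields two instructions, which is why `set` needs care when it is placed in a delay slot.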

## Be careful how you use synthetic instructions

```
call sub
set 0x12345678, %o0
```

### This sequence, in reality, is

```
call sub
sethi %hi(0x12345678), %o0    ! only this instruction is in the delay slot
or %o0, %lo(0x12345678), %o0  ! not in the delay slot: runs only after sub returns
```

# Structural analysis is important for code optimization

- Suppose a compiler determines that m is used in only the first half of a program, and n in only the second half. Then the compiler can map m and n to the same register. If, instead, it mapped m and n to different registers, there would be one fewer register available to hold other variables.
- Suppose m and n are used throughout a program, but an analysis reveals that m will be accessed more often than n. If only one register is available, the compiler should map m, not n, to that register.
- Suppose the following sequence appears in a program:

```
a = p -> q -> r -> x;

cout << a;

c = p -> q -> r -> y;
```

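Does p -> q -> r have to be computed twice? No, in this case. As long as the intervening statement does not change p, p -> q, or p -> q -> r, the compiler can compute the address once and reuse it for both x and y.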
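The reuse of p -> q -> r can be shown by hand in C++. The sketch below is an illustration only: the struct names `P`, `Q`, `R` and the functions `sum_naive` and `sum_reused` are hypothetical, and the transformation is valid only under the assumption that nothing between the two accesses modifies the pointer chain.

```cpp
#include <cassert>

// Hypothetical types matching the shape of p -> q -> r -> x.
struct R { int x; int y; };
struct Q { R* r; };
struct P { Q* q; };

// Direct translation: the chain p -> q -> r is walked twice.
int sum_naive(const P* p) {
    int a = p->q->r->x;
    int c = p->q->r->y;
    return a + c;
}

// What structural analysis allows: walk the chain once and
// keep the resulting address (ideally in a register).
int sum_reused(const P* p) {
    const R* t = p->q->r;   // common subexpression, computed once
    int a = t->x;
    int c = t->y;
    return a + c;
}
```

Both functions return the same result; the second performs two fewer pointer loads.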

Examples of C++ programs and their compiler-generated SPARC assembly code follow.

```
FIGURE 15.24 a) C++ code
int gv1, gv2 = 5;
int fa(int x, int y, int z)
{
   return x + y + z;
}
int main()
{
   int lv1, lv2 = 7;

   lv1 = 11;
   gv1 = fa(gv2, lv1, lv2);

   return 0;
}
```

#### FIGURE 15.24 (continued)

```
b) SPARC code
        .section ".text"
        .global fa_iii
fa_iii:
        add     %o0, %o1, %o0          ! return x + y + z;
        retl
        add     %o0, %o2, %o0          ! delay slot

        .global main
main:
        save    %sp, -96, %sp

        sethi   %hi(gv2), %o0          ! gv1 = fa(gv2, lv1, lv2);
        ld      [%o0+%lo(gv2)], %o0
        mov     11, %o1                ! lv1 = 11;
        call    fa_iii
        mov     7, %o2                 ! int lv2 = 7; (delay slot)
        sethi   %hi(gv1), %o1          ! store ret value in gv1
        st      %o0, [%o1 + %lo(gv1)]

        ret
        restore %g0, 0, %o0

        .section ".data"
        .global gv2                    ! int gv2 = 5;
        .align  4
gv2:
        .word   5
        .section ".bss"
        .global gv1
        .align  4
gv1:
        .skip   4
```

### **FIGURE 15.25**

```
a) C++ code
int gv;
int fb(int t, int u, int v, int w, int x, int y, int z)
{
   return t + u + v + w + x + y + z;
}
int main()
{
   gv = fb(1, 2, 3, 4, 5, 6, 7);
   return 0;
}

b) SPARC code
        .section ".text"
        .global fb_iiiiiii
fb_iiiiiii:
        ld      [%sp+92], %g1          ! seventh argument is on the stack
        add     %o0, %o1, %o0          ! return t + u + v + w + x + y + z;
        add     %o0, %o2, %o0
        add     %o0, %o3, %o0
        add     %o0, %o4, %o0
        add     %o0, %o5, %o0
        retl
        add     %o0, %g1, %o0          ! delay slot

        .global main
main:
        save    %sp, -96, %sp

        mov     1, %o0                 ! gv = fb(1, 2, 3, 4, 5, 6, 7);
        mov     2, %o1
        mov     3, %o2
        mov     4, %o3
        mov     5, %o4
        mov     6, %o5
        mov     7, %g1
        call    fb_iiiiiii
        st      %g1, [%sp+92]          ! delay slot

        ret                            ! return 0;
        restore %g0, 0, %o0
```

### **FIGURE 15.26** a)

```
void fc(int *p)
{
   *p = 99;
}
int main()
{
   int lv;
   fc(&lv);
   return 0;
}
```

```
b)
        .section ".text"
        .global fc_pi
fc_pi:
        mov     99, %g1                ! *p = 99;
        retl
        st      %g1, [%o0]             ! delay slot

        .global main
main:
        save    %sp, -96, %sp

        call    fc_pi                  ! fc(&lv);
        add     %sp, 92, %o0           ! delay slot: pass address of lv

        ret                            ! return 0;
        restore %g0, 0, %o0
```

### **FIGURE 15.27**

```
a)
int *gpv;
void fnull()
{}
void fd(int x)
{
   gpv = &x;
   *gpv = x + 3;
   fnull();          // forces fd to save/restore
}
int main()
{
   int lv = 7;

   fd(lv);

   return 0;
}
```

```
b)
        .section ".text"
        .global fnull_v
fnull_v:
        retl
        nop

        .global fd_i
fd_i:
        save    %sp, -96, %sp

        sethi   %hi(gpv), %o1          ! gpv = &x;
        add     %fp, 68, %o0
        st      %o0, [%o1+%lo(gpv)]

        mov     %i0, %o0               ! *gpv = x + 3;
        add     %o0, 3, %o0
        call    fnull_v                ! fnull();
        st      %o0, [%fp+68]          ! delay slot

        ret
        restore

        .global main
main:
        save    %sp, -96, %sp
                                                     (continued)
```

### FIGURE 15.27 (continued)

```
        call    fd_i                   ! fd(lv);
        mov     7, %o0                 ! delay slot

        ret
        restore

        .section ".bss"
        .global gpv
        .align  4
gpv:
        .skip   4
```

## Memory-mapped I/O on SPARC

```
set 0xfffffff0, %o0
mov 1, %i0
stb %i0, [%o0] ! send read command
```

To get status, we would use

```
ldb [%o0 + 1], %i0 ! get status
```

When the status byte indicates that data is available, we would then read the data with

```
ldb [%o0 + 2], %i0 ! get data
```
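The same command/status/data protocol can be sketched in C++, where memory-mapped registers are reached through a `volatile` pointer. This is a minimal sketch under stated assumptions: the offsets 0, 1, and 2 come from the SPARC fragments above, but the function name `read_byte`, the constants, and the convention that any nonzero status byte means "data available" are hypothetical, since the text does not give the status encoding.

```cpp
#include <cassert>
#include <cstdint>

// Offsets from the device base address, as in the SPARC fragments.
constexpr int kCommand = 0;
constexpr int kStatus  = 1;
constexpr int kData    = 2;
constexpr std::uint8_t kReadCommand = 1;

// Assumption: a nonzero status byte means data is available.
// volatile keeps the compiler from caching or removing the accesses.
std::uint8_t read_byte(volatile std::uint8_t* device) {
    device[kCommand] = kReadCommand;     // like stb: send read command
    while (device[kStatus] == 0) {       // like ldb: poll status
    }
    return device[kData];                // like ldb: get data
}
```

On real hardware the pointer would hold the device address (0xfffffff0 in the example above); for experimentation, a three-byte array with the status byte preset to a nonzero value can stand in for the device.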

# The Pentium is a CISC architecture

Principal Registers on the Pentium

| 32-bit register | Low 16 bits | High byte of low 16 | Low byte of low 16 |
|-----------------|-------------|---------------------|--------------------|
| EAX             | AX          | AH                  | AL                 |
| EBX             | BX          | BH                  | BL                 |
| ECX             | CX          | CH                  | CL                 |
| EDX             | DX          | DH                  | DL                 |
| EBP             | BP          |                     |                    |
| EDI             | DI          |                     |                    |
| ESI             | SI          |                     |                    |
| ESP             | SP          |                     |                    |
| EIP             | IP          |                     |                    |
| EFLAGS          | FLAGS       |                     |                    |

# mov instructions: direction is right to left

```
mov eax, 3        ; load eax with 4-byte integer 3
mov ax, 3         ; load ax with 2-byte integer 3
mov eax, [ebp-8]  ; load eax from memory
mov [ebp-8], eax  ; store eax into memory
```

## Length of target field is obvious in these instructions.

```
mov [ebp-8], eax  ; store 4 bytes
mov [ebp-8], ax   ; store 2 bytes
```

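Length of target field is not obvious in an instruction such as `mov [ebp-8], 7`: neither the constant 7 nor the memory operand indicates how many bytes to store.

Must disambiguate by specifying the length of the target field.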

### How to disambiguate

```
mov byte ptr [ebp-8], 7
```

stores a single byte containing 7 into the location given by [ebp-8]. Similarly,

```
mov word ptr [ebp-8], 7
```

stores a word containing 7, and

```
mov dword ptr [ebp-8], 7
```

stores a doubleword containing 7.
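The effect of the three size specifiers can be sketched in C++ by storing the same constant with different widths into a byte buffer. The helper `store_le` is a hypothetical name; it assumes little-endian byte order, as on the Pentium.

```cpp
#include <cassert>
#include <cstdint>

// Stores the low `width` bytes of value at target, least significant
// byte first. width 1, 2, 4 correspond to byte ptr, word ptr, dword ptr.
void store_le(std::uint8_t* target, std::uint32_t value, int width) {
    assert(width == 1 || width == 2 || width == 4);
    for (int i = 0; i < width; ++i) {
        target[i] = static_cast<std::uint8_t>(value >> (8 * i));
    }
}
```

Starting from a buffer filled with 0xFF, a width of 1 changes only the first byte to 7, a width of 2 also clears the second byte, and a width of 4 clears the third and fourth as well. That difference is exactly what the assembler cannot infer from `7` alone.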

## push-mov sequence just like esba instruction in H1

```
push ebp        ; save ebp on the stack
mov ebp, esp    ; move in called fn's frame address
```

## mov-pop sequence just like reba instruction in H1

    mov esp, ebp
    pop ebp

#### **FIGURE 15.29**

```
Intel Pentium code
.code
        public @fa$iii
@fa$iii:
        push    ebp
        mov     ebp, esp

        mov     eax, [ebp+8]             ; return x + y + z;
        add     eax, [ebp+12]
        add     eax, [ebp+16]

        pop     ebp
        ret

        public main
main:
        push    ebp
        mov     ebp, esp

        add     esp, -8                  ; int lv1, lv2 = 7;
        mov     dword ptr [ebp-8], 7

        mov     dword ptr [ebp-4], 11    ; lv1 = 11;

        push    dword ptr [ebp-8]        ; gv1 = fa(gv2, lv1, lv2);
        push    dword ptr [ebp-4]
        push    dword ptr [gv2]
        call    @fa$iii
        add     esp, 12
        mov     [gv1], eax

        xor     eax, eax                 ; return 0;

        mov     esp, ebp
        pop     ebp
        ret

.data
        public gv1
gv1     dd      ?                        ; reserve 4 bytes
        public gv2
gv2     dd      5                        ; define constant 5
```

**FIGURE 15.30** 

a) Before push/mov sequence (right after call instruction)



b) After push/mov sequence



Use addresses relative to ebp to access parameters and local variables. For example,

mov eax, [ebp+8]
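Why the first parameter sits at [ebp+8] can be traced with a small C++ simulation of the stack. This is a sketch only: `Machine` and `call_fa` are hypothetical names, the stack is a 64-word array, and the return address is a placeholder value. The sequence follows the call to fa in Figure 15.29.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model: a small stack plus the esp and ebp registers.
struct Machine {
    std::vector<std::uint32_t> memory = std::vector<std::uint32_t>(64, 0);
    std::uint32_t esp = 64 * 4;   // byte address just past the stack area
    std::uint32_t ebp = 0;

    void push(std::uint32_t v) { esp -= 4; memory[esp / 4] = v; }
    std::uint32_t pop() { std::uint32_t v = memory[esp / 4]; esp += 4; return v; }
    std::uint32_t load(std::uint32_t address) const { return memory[address / 4]; }
};

// Mirrors the calling sequence for fa(x, y, z) in Figure 15.29.
std::uint32_t call_fa(Machine& m, std::uint32_t x, std::uint32_t y, std::uint32_t z) {
    m.push(z);                                // arguments pushed right to left
    m.push(y);
    m.push(x);
    m.push(0xdeadbeefu);                      // call: return address (placeholder)
    m.push(m.ebp);                            // push ebp
    m.ebp = m.esp;                            // mov ebp, esp

    // [ebp+0] = saved ebp, [ebp+4] = return address, [ebp+8] = x.
    std::uint32_t eax = m.load(m.ebp + 8);    // mov eax, [ebp+8]
    eax += m.load(m.ebp + 12);                // add eax, [ebp+12]
    eax += m.load(m.ebp + 16);                // add eax, [ebp+16]

    m.ebp = m.pop();                          // pop ebp
    m.pop();                                  // ret: discard return address
    m.esp += 12;                              // add esp, 12 (caller removes args)
    return eax;
}
```

The saved ebp and the return address occupy the two words at [ebp] and [ebp+4], so parameters begin at [ebp+8] and are spaced 4 bytes apart.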

The Pentium uses I/O instructions rather than memory-mapped I/O.

### FIGURE 15.31 I/O Instructions on the Pentium

| Instruction | Operands                  |
|-------------|---------------------------|
| IN          | AL, DX                    |
| IN          | AX, DX                    |
| IN          | EAX, DX                   |
| IN          | AL, <8-bit port number>   |
| IN          | AX, <8-bit port number>   |
| IN          | EAX, <8-bit port number>  |
| OUT         | DX, AL                    |
| OUT         | DX, AX                    |
| OUT         | DX, EAX                   |
| OUT         | <8-bit port number>, AL   |
| OUT         | <8-bit port number>, AX   |
| OUT         | <8-bit port number>, EAX  |

**FIGURE 15.32** 





IN AL, DX ; read one byte

reads 1 byte (because AL is a 1-byte register) from the I/O device whose port number (i.e., address) is in DX, but

IN AX,DX ; read two bytes

reads 2 bytes (because **AX** is a 2-byte register). I/O instructions can specify the port number in two ways: directly with an 8-bit constant, or indirectly with the DX register. For example, in the instruction

IN AL,61H ; port number address is 61H

the port number is specified directly, but in

IN AL, DX ; port number is in DX register

it is specified indirectly. With the direct approach, the port number is limited to 8 bits; with the indirect approach it is limited to 16 bits. Figure 15.31 summarizes the I/O instructions on the Pentium.
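The two rules in this passage (the data register fixes the transfer size, and the port number decides which addressing form is possible) can be written down as a small C++ sketch. The names `PortForm`, `choose_form`, and `transfer_size` are hypothetical and serve only to restate the rules; the code performs no actual I/O.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

enum class PortForm { Direct, IndirectDX };

// A port number that fits in 8 bits can be given directly in the
// instruction; anything larger (up to 16 bits) must go through DX.
PortForm choose_form(std::uint16_t port) {
    return port <= 0xFF ? PortForm::Direct : PortForm::IndirectDX;
}

// Number of bytes moved by IN or OUT, determined by the data register.
// Returns 0 for a register that is not valid in an I/O instruction.
int transfer_size(const std::string& reg) {
    if (reg == "AL")  return 1;
    if (reg == "AX")  return 2;
    if (reg == "EAX") return 4;
    return 0;
}
```

For example, port 61H can use the direct form, while a port number such as 3F8H does not fit in 8 bits and would have to be loaded into DX first.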